<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<link rel="stylesheet" href="../style/journal.css" type="text/css" />
<style type="text/css"><!--
.googleadsense {
	margin: 2px;
	padding: 0px;
}
//--></style><script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-65008-1";
urchinTracker();
</script><title>Lingua::Han::Utils for Chinese Characters</title>
</head>
<body>
<p>&lt;&lt;Previous: <a href="Cantonese.html">Lingua::Han::Cantonese for Cantonese</a>&nbsp;&nbsp;&gt;&gt;Next: <a href="Lingua-Han-Stroke.html">Chinese character stroke module</a></p>
<h1>Lingua::Han::Utils for Chinese Characters</h1>
<div class='content'>
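<p>Category: <a href='MyCPAN.html'>MyCPAN</a> &nbsp; Keywords: <b>Chinese characters</b></p>
This is a module I wrote just today: <a href="http://search.cpan.org/perldoc?Lingua::Han::Utils">Lingua::Han::Utils</a><br />
It wraps a few functions I use often when processing Chinese characters. It currently provides five functions, described below.
<h3>Unihan_value</h3>
Returns the first field of Unihan.txt, with the U+ prefix removed. Unihan: <a href="http://www.unicode.org/Public/UNIDATA/">http://www.unicode.org/Public/UNIDATA/</a><br />
The usefulness of Unihan goes without saying: the data behind my pinyin, stroke, and Cantonese modules all comes from this file.<p />
<pre>use Lingua::Han::Utils qw/Unihan_value/;
# Unihan_value
# return the first field of Unihan.txt on unicode.org
my $word = "我";
my $unihan = Unihan_value($word); # returns '6211'
my $words = "爱你";
my @unihan = Unihan_value($words); # list context: returns (7231, 4F60)
my $joined = Unihan_value($words); # scalar context: returns 72314F60</pre>
<h3>cdecode</h3>
Thanks to <a href="http://www.livejournal.com/users/joe_jiang/">joe jiang</a> for the help; <a href="http://search.cpan.org/perldoc?Encode::Guess">Encode::Guess</a> turned out to be exactly what was needed.<br />
Generally speaking, our code gets written in one of two ways: in the editor's ASCII mode (the file is saved in a legacy encoding such as GB2312) or in its Unicode mode (the file is saved as UTF-8).<br />
The decode call differs between the two:<br />
<ul>
<li>In ASCII mode, use decode('euc-cn', $word) or decode('gb2312', $word)</li>
<li>In Unicode mode, use decode('utf8', $word)</li>
</ul>
This module uses Guess to wrap both cases, so whenever you need to decode you can call cdecode directly without worrying about which mode the text was written in.
<h3>csplit</h3>
Splits text into individual characters. The text can be pure Chinese or mixed Chinese and English.<br />
<pre>use Lingua::Han::Utils qw/csplit/;
my $words = "我爱你";
my @words = csplit($words); # returns ("我", "爱", "你")</pre>
<h3>csubstr</h3>
Extracts part of a string by character position. The text can be pure Chinese or mixed Chinese and English.<br />
<pre>use Lingua::Han::Utils qw/csubstr/;
my $words = "我爱你啊";
my @two = csubstr($words, 1, 2); # list context: returns ("爱", "你")
my @rest = csubstr($words, 1); # no length: returns ("爱", "你", "啊")
my $joined = csubstr($words, 1, 2); # scalar context: returns "爱你"</pre>
<h3>clength</h3>
Returns the length of a string, counting each Chinese character as a single character.<br />
<pre>use Lingua::Han::Utils qw/clength/;
my $words = "我ya爱你";
print clength($words); # 5</pre>
<h2>Conclusion</h2>
This module is on its way to CPAN. If you need it right away, you can download it here: <a href="http://www.fayland.org/CPAN/">http://www.fayland.org/CPAN/</a></div>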
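<div class='content'>
<p>To make the decode-then-process idea concrete, here is a minimal sketch that uses only the core Encode module. It is not the module's implementation: instead of Encode::Guess it swaps in a simpler fallback (try strict UTF-8 first, then EUC-CN), and the helper names decode_han and code_points are hypothetical. Once the bytes are decoded, splitting, counting, and taking code points all work per character; the hex code point is the same value as the first field of Unihan.txt without the U+ prefix.</p>
<pre>use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Hypothetical helper (an assumption, not part of Lingua::Han::Utils):
# decode bytes as strict UTF-8, and fall back to EUC-CN if that fails.
sub decode_han {
    my ($bytes) = @_;
    # Encode may modify its input when a CHECK value is passed, so decode copies.
    my $copy = $bytes;
    my $text = eval { decode('UTF-8', $copy, FB_CROAK) };
    return $text if defined $text;
    $copy = $bytes;
    return decode('euc-cn', $copy, FB_CROAK);
}

# Hypothetical helper: the hex code point of every character in the input.
sub code_points {
    my ($bytes) = @_;
    return map { sprintf '%X', ord($_) } split //, decode_han($bytes);
}

# The string 我ya爱你, written with escapes so the file encoding does not matter.
my $text       = "\x{6211}ya\x{7231}\x{4F60}";
my $utf8_bytes = encode('UTF-8', $text);   # as saved in "Unicode mode"
my $gb_bytes   = encode('euc-cn', $text);  # as saved in "ASCII mode"

for my $bytes ($utf8_bytes, $gb_bytes) {
    my @chars = split //, decode_han($bytes);
    print scalar(@chars), "\n";                  # 5
    print join(' ', code_points($bytes)), "\n";  # 6211 79 61 7231 4F60
}</pre>
<p>Trying UTF-8 first is a deliberate choice in this sketch: GB2312 byte pairs are rarely valid UTF-8, so a strict UTF-8 decode that succeeds is a reasonable signal. Short or unusual inputs can still be ambiguous, which is the kind of case a guessing approach has to handle as well.</p>
</div>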
<strong>Related items</strong>
<ul><li><a href='ChineseCoding.html'>Notes on Chinese character encoding</a> &lt; <span class='digit'>2004-11-23 13:23:11</span> &gt;</li><li><a href='Han2PinYin.html'>A module for converting Chinese characters to pinyin</a> &lt; <span class='digit'>2004-12-06 22:36:39</span> &gt;</li><li><a href='Lingua-Han-Stroke.html'>Chinese character stroke module</a> &lt; <span class='digit'>2005-11-18 20:27:49</span> &gt;</li></ul>
Created on <span class="digit">2005-11-18 18:37:41</span>, Last modified on <span class="digit">2005-11-25 11:45:57</span><br />
Copyright 2004-2005 All Rights Reserved. Powered by <a href="Eplanet.html">Eplanet</a> &amp;&amp; <a href='http://catalyst.perl.org'>Catalyst</a> 5.62.
</body>
</html>